Overview
Can your model unlock the secrets of polymers? In this competition, you're tasked with predicting the fundamental properties of polymers to speed up the development of new materials. Your contributions will help researchers innovate faster, paving the way for more sustainable and biocompatible materials that can positively impact our planet.

Start

24 days ago
Close
2 months to go
Merger & Entry
Description
Polymers are the essential building blocks of our world, from the DNA within our bodies to the plastics we use every day. They are key to innovation in critical fields like medicine, electronics, and sustainability. The search for the next generation of groundbreaking, eco-friendly materials is on, and machine learning can be the solution. However, progress has been stalled by one major hurdle: a critical lack of accessible, high-quality data.

Our Open Polymer Prediction 2025 introduces a game-changing, large-scale open-source dataset – ten times larger than any existing resource. We invite you to piece together the missing links and unlock the vast potential of sustainable materials.

Your mission is to predict a polymer's real-world performance directly from its chemical structure. You'll be provided with a polymer's structure as a simple text string (SMILES), and your challenge is to build a model that can accurately forecast five key metrics that determine how it will behave. This includes predicting its density, its response to heat thermal conductivity(Tc) and glass transition temperature(Tg), and its fundamental molecular size and packing efficiency radius of gyration(Rg) and fractional free volume(FFV). The ground truth for this competition is averaged from multiple runs of molecular dynamics simulation.

Your contributions have the potential to redefine polymer discovery, accelerating sustainable polymer research through virtual screening and driving significant advancements in materials science.

Evaluation
The evaluation metric for this contest is a weighted Mean Absolute Error (wMAE) across five polymer properties, defined as:


where 
 is the set of polymers being evaluated, and 
 is the set of property types for a polymer 
. The term 
 is the predicted value and 
 is the true value of the 
-th property.

To ensure that all property types contribute equally regardless of their scale or frequency, we apply a reweighting factor 
 to each property type:

where 
 is the number of available values for the 
-th property, and 
 is the estimated value range of that property type based on the test data. 
 is the total number of tasks. This design has three goals:

Scale normalization: Division by the range 
 ensures that properties with larger numerical ranges do not dominate the error.
Inverse square-root scaling: To prevent some properties from being overlooked, the term 
 assigns a disproportionately high weight to rare properties with fewer samples.
Weight normalization: The second term is normalized so that the sum of weights across all 
 properties is 
.
Submission File
The submission file for this competition must be a csv file. For each id in the test set, you must predict a value for each of the five chemical properties. The file should contain a header and have the following format.

   id,Tg,FFV,Tc,Density,Rg
   2112371,0.0,0.0,0.0,0.0,0.0
   2021324,0.0,0.0,0.0,0.0,0.0
   343242,0.0,0.0,0.0,0.0,0.0


Prizes
1st Place: $12,000
2nd Place: $10,000
3rd Place: $10,000
4th Place: $8,000
5th Place: $5,000
Top Student Group: $5,000 to the highest performing student team. A team would be considered a student team if majority members (e.g., at least 3 out of a 5 member team) are students enrolled in a high school or university degree. In the case of an even number of members, half of them must be students. A google form requesting proof materials will be shared at the end of the competition and Top Student Group will be selected from the application.

Code Requirements

Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:

CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv
Please see the Code Competition FAQ for more information on how to submit. And review the code debugging doc if you are encountering submission errors.

Citation
Gang Liu, Jiaxin Xu, Eric Inae, Yihan Zhu, Ying Li, Tengfei Luo, Meng Jiang, Yao Yan, Walter Reade, Sohier Dane, Addison Howard, and María Cruz. NeurIPS - Open Polymer Prediction 2025. https://kaggle.com/competitions/neurips-open-polymer-prediction-2025, 2025. Kaggle.

Dataset Description
In this competition, your task is to use polymer structure data (SMILES) to predict five key chemical properties derived from molecular dynamics simulation: glass transition temperature (Tg), fractional free volume (FFV), thermal conductivity (Tc), polymer density, and radius of gyration (Rg). Successfully predicting these properties is crucial for scientists to accelerate the design of novel polymers with targeted characteristics, which can be used in various applications.

This competition uses a hidden test set. When your submitted notebook is scored, the actual test data will be made available to your notebook. Expect approximately 1,500 polymers in the hidden test set.

Files
train.csv

id - Unique identifier for each polymer.
SMILES - Sequence-like chemical notation of polymer structures.
Tg - Glass transition temperature (
.
FFV - Fractional free volume.
Tc - Thermal conductivity (
).
Density - Polymer density (
).
Rg - Radius of gyration (
Å
).
test.csv

id - Unique identifier for each polymer.
SMILES - Sequence-like chemical notation of polymer structures.
sample_submission.csv

A sample submission in the correct format.
train_supplement

dataset1.csv - Tc data from the host’s older simulation results
dataset2.csv - SMILES from this Tg table. We are only able to provide the list of SMILES.
dataset3.csv - data from the host’s older simulation results
dataset4.csv - data from the host’s older simulation results

